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Abstract 

Background/Aims: Many cognitive screening instruments (CSI) are available to clinicians to 
assess cognitive function. The optimal method comparing the diagnostic utility of such tests 
is uncertain. The effect size (Cohen's d), calculated as the difference of the means of two 
groups divided by the weighted pooled standard deviations of these groups, may permit such 
comparisons. Methods: Datasets from five pragmatic diagnostic accuracy studies, which ex- 
amined the Mini-Mental State Examination (MMSE), the Mini-Mental Parkinson (MMP), the 
Six-Item Cognitive Impairment Test (6CIT), the Montreal Cognitive Assessment (MoCA), the 
Test Your Memory test (TYM), and the Addenbrooke's Cognitive Examination-Revised (ACE-R), 
were analysed to calculate the effect size (Cohen's d) for the diagnosis of dementia versus no 
dementia and for the diagnosis of mild cognitive impairment versus no dementia (subjective 
memory impairment). Results: The effect sizes for dementia versus no dementia diagnosis 
were large for all six CSI examined (range 1.59-1.87). For the diagnosis of mild cognitive im- 
pairment versus no dementia, the effect sizes ranged from medium to large (range 0.48-1.45), 
with MoCA having the largest effect size. Conclusion: The calculation of the effect size (Co- 
hen's d) in diagnostic accuracy studies is straightforward. The routine incorporation of effect 
size calculations into diagnostic accuracy studies merits consideration in order to facilitate the 
comparison of the relative value of CSI. © 2014 s. Karger ag, Basel 
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Introduction 



The utility of cognitive screening instruments (CSI) for the diagnosis of dementia and 
lesser degrees of cognitive impairment may be indicated by a number of summary param- 
eters, of which the most familiar are probably sensitivity and specificity. Predictive values, 
likelihood ratios, clinical utility indexes, agreement between tests [kappa statistic), and the 
area under the receiver operating characteristic curve [AUG ROC] may also be used as param- 
eters of diagnostic utility [1], but all these measures have potential shortcomings. For example, 
sensitivity and specificity may be difficult to apply to individual patients, predictive values 
are influenced by the prevalence of the disease in the population being tested, and AUG ROC 
combines the test accuracy over a range of thresholds, which may be both clinically relevant 
and clinically nonsensical. 

Another metric that may be used to demonstrate utility is the effect size. The effect size 
may be denoted by a variety of summary indices, of which Cohen's d is probably the most 
commonly used in the medical literature [2]. This parameter is calculated as the difference of 
the means of two groups divided by the weighted pooled standard deviations of these groups 
[fig. 1). Cohen [3] suggested that effect sizes of 0.2-0.3 were small, 0.5 medium, and >0.8 large. 

One example of the potential utility of this approach in clinical studies of cognitive 
impairment was demonstrated by Br0nnick [4] who compared standardized effect sizes of 
cognitive functions between groups of patients diagnosed with Parkinson's disease, Parkin- 
son's disease dementia, Alzheimer's disease, and normal controls to identify differences in 
group mean values [Cohen's d). This study showed larger effect sizes for tests of memory in 
Alzheimer's disease and of executive and visuospatial function in Parkinson's disease 
dementia, indicating greater impairments in these domains. Hence, testing of these selected 
cognitive functions may be of particular utility for the differential diagnosis of these condi- 
tions. 

The aim of the study presented here was to calculate the Cohen's d metric from the 
datasets of a number of pragmatic prospective diagnostic accuracy studies undertaken in 
dedicated secondary care memory clinics to calculate effect sizes for several CSI, specifically 
the Mini-Mental State Examination [MMSE) [5], the Mini-Mental Parkinson [MMP) [6], the 
Six-Item Cognitive Impairment Test [6CIT) [7], the Montreal Cognitive Assessment [MoCA) 
[8], the Test Your Memory [TYM) test [9], and the Addenbrooke's Cognitive Examination- 
Revised [ACE-R) [10]. The calculation of the effect size was undertaken for both the diagnosis 
of dementia versus no dementia and of mild cognitive impairment versus no dementia. 



Materials and Metliods 

Data from five previous pragmatic diagnostic accuracy studies [11-15], which examined 
six different CSI [5-10], namely the MMSE [11], MMP [11], 6CIT [12], MoCA [13], TYM [14], 
and ACE-R [15], were reanalysed. Study details [setting, sample size, dementia prevalence, 
sex ratio, and age range) are shown in table 1. 

In each of these studies, the criterion diagnosis was established by the judgment of an 
experienced clinician based on widely accepted clinical diagnostic criteria. The mean test 
scores for demented and non-demented groups as well as for mild cognitive impairment and 
non-demented groups, along with their standard deviations, were applied to the Cohen's d 
formula [fig. 1) to calculate effect sizes. Because these were clinic-based pragmatic studies, 
there was no normal control group, the non-demented cases consisting of patients with at 
least subjective memory impairment as well as patients with mild cognitive impairment 
insufficient to mandate a dementia diagnosis. 
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Fig. 1. d = Cohen's d effect size; Xi 
and X2 = means of the two groups; 
Si and S2 = standard deviations of 
the two groups. 




Table 1. Study demographics 



CSI 


Setting 


n 


Dementia 


M:F 


Age range, 


Ref. 








prevalence, % 


(% male] 


years 




MMSE, MMP 


Cognitive function chnic 


225 


21 


130:95 [58] 


20-86 (median 62] 


11 


6CIT 


Cognitive function chnic 


186 


19 


96:90 (52] 


16-94 (median 59] 


12 


IVloCA 


Cognitive function chnic 


150 


24 


93:57 (62] 


20-87 (median 61] 


13 


TYM 


Cognitive function chnic and old age 


224 


35 


130:94 (58] 


20-90 (mean 63] 


14 




psychiatry memory chnic 












ACE-R 


Old age psychiatry memory clinic 


183 


40 


105:78 (57] 


37-89 (median 67] 


15 



Table 2. Cohen's d effect size for 
dementia versus no dementia and 
for mild cognitive impairment 
versus no dementia 



CSI 


Cohen's d: dementia 
vs. no dementia 


Cohen's d: mild cognitive 
impairment vs. no dementia 


IVIIVISE 


1.59 (large] 


0.69 (medium] 


IVIMP 


1.78 (large] 


0.81 (large] 


6CIT 


1.84 (large] 


0.65 (medium] 


IVloCA 


1.80 (large] 


1.45 (large] 


TYIVI 


1.62 (large] 


0.48 (medium] 


ACE-R 


1.87 (large] 


0.73 (medium] 



Results 

The calculation of Cohen's d comparing patients with and without dementia suggested 
large but similar effect sizes for all CSI examined [table 2, left-hand column), using the clas- 
sification suggested by Cohen [2, 3]. These values suggested a consistent difference in test 
scores between demented and non-demented individuals. 

The calculation of Cohen's d comparing patients with mild cognitive impairment and no 
dementia [subjective memory impairment) suggested smaller effect sizes for all CSI examined 
than in the dementia versus no dementia distinction [table 2, right-hand column), using the 
classification suggested by Cohen [2, 3]. However, one effect size, namely that for the MoCA, 
was clearly larger than all the others. These values suggested a consistent difference in test 
scores between MCI and non-demented individuals, but with the MoCA performing best. 

For comparative purposes, other parameters of diagnostic utility derived from the 
sampled diagnostic accuracy studies [11-15] are shown, namely sensitivity, specificity, 
predictive values, likelihood ratios, and AUC ROC [table 3). 
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Table 3. Other diagnostic utility measures for dementia versus no dementia 



CSI 


Overall 
accuracy 


Sens 


Spec 


PPV 


NPV 


LR+ 


LR- 


AUG ROC Ref. 


MMSE 


0.86 


0.45 


0.98 


0.88 


0.85 


22.9 


0.56 


0.87 11 


iVlMP 


0.86 


0.51 


0.97 


0.83 


0.87 


15.7 


0.51 


0.89 11 


6CIT (n = 100] 


0.79 


0.80 


0.79 


0.48 


0.94 


3.8 


0.25 


0.88 12 


iVloCA 


0.81 


0.63 


0.95 


0.91 


0.77 


13.4 


0.39 


0.91 13 


TYM 


0.83 


0.73 


0.88 


0.77 


0.86 


6.3 


0.30 


0.89 14 


ACE-R 


0.87 


0.82 


0.89 


0.75 


0.93 


7.7 


0.20 


N/A 15 



Sens = Sensitivity; Spec = specificity; PPV = positive predictive value; NPV = negative predictive value; 
LR+ = positive likelihood ratio; LR- = negative likelihood ratio; N/A = not available. 



Discussion 

In this study, data from five pragmatic diagnostic accuracy studies of CSI [11-15] were 
re-analysed to calculate effect sizes [Cohen's d) for the diagnosis of dementia versus no 
dementia and for the diagnosis of mild cognitive impairment versus no dementia. These were 
observational studies, which examined unselected patient groups with cognitive complaints 
of unknown aetiology. Such pragmatic diagnostic accuracy studies [16] differ from experi- 
mental studies in which patient groups are selected by known diagnostic categories, often 
with a normal control group, and then have the test or intervention applied [essentially case- 
control studies). Since pragmatic diagnostic accuracy studies reflect the idioms of clinical 
practice in terms of the typical spectrum of patients seen more closely, it may be argued that 
their results are more broadly generalizable [1], although the groupings are more heteroge- 
neous than in experimental studies. 

Making comparisons between studies is, of course, problematic, notwithstanding the 
consistency of study protocols and authorship in the studies examined here. One potential 
shortcoming of the current analysis was the different settings and sample characteristics for 
each of the five studies [table 1). Inevitably, the case-mix in clinics run by neurologists and by 
old age psychiatrists will differ in terms of both patient age and dementia prevalence. One 
outcome of this heterogeneity was insufficient data to compare effect sizes by patient age. 

Overall, the calculations suggested a large effect size for all CSI examined for the diag- 
nosis of dementia versus no dementia [table 2, left-hand column). Since the purpose of these 
instruments is to screen for the diagnosis of dementia, this finding is perhaps not surprising. 

Effect size calculations for the differential diagnosis of mild cognitive impairment versus 
no dementia [subjective memory impairment) gave lower values for all CSI [table 2, right- 
hand column), as might be anticipated for this more challenging clinical distinction. These 
calculations suggested that the MoCA performed best, as might be expected since this 
instrument was specifically designed to detect mild cognitive impairment [9] . This distinction 
between the performance of different CSI may be of clinical importance when disease-modi- 
fying drugs for cognitive impairment are developed. 

How do effect sizes relate to other diagnostic utility measures [table 3)? As a global 
measure, encompassed in a single outcome number compared to criterion values [3], effect 
sizes are perhaps akin to AUC ROC, which likewise differed little between the various CSI 
examined in this study [table 3). The effect size thus gives an overall index of test diagnostic 
utility, but obviously this metric may conceal differences between tests in terms of, for 
example, their relative sensitivity [range 0.45-0.82) and specificity [range 0.79-0.98), 
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although these parameters are obviously dependent on the chosen test cutoff. Thus, effect 
sizes may need to be used in conjunction with other summary test metrics to decide which 
might be most appropriate for the given clinical situation, for example, whether the clinician 
seeks high sensitivity or specificity. 

The calculated effect sizes for the CSl examined in this study permit some comparison 
between instruments, although other summary measures and analyses are also available for 
this purpose. Meta-analysis is perhaps the most favoured approach for its methodological 
rigor, but the outcomes are dependent on the stringency of study inclusion and exclusion 
criteria, which may sometimes give results that are not anticipated [e.g. comparability of the 
MMSE and ACE/ACE-R, due to a surprisingly high sensitivity of the MMSE in included studies 
[17]). The test of agreement between tests [kappa statistic) can be used to measure whether 
agreement between tests is perfect [kappa = 1) or due to chance alone [kappa = 0) [18]. The 
AUG ROC is a measure of diagnostic accuracy but has been criticised for combining test 
accuracy over a range of thresholds, which may be both clinically relevant and clinically non- 
sensical [19]. Weighted comparison may be used to indicate the net benefit of one test with 
respect to another and permits the calculation of the equivalent increase in the number of 
true positive patients identified per 1,000 patients tested [19]. One such weighted comparison 
study suggested the superiority of the ACE-R and MoCA over the MMSE and the inferiority of 
TYM to MMSE [20]. 

As shown in this study, the calculation of effect size [Cohen's d) for CSl is straightforward. 
No previous analyses of the comparative diagnostic utility of CSl using this method have been 
identified. This study suggests that there is a case for the routine incorporation of effect sizes 
[it should be noted that there are effect size formulae other than Cohen's d [21]) as a measure 
of diagnostic test performance into diagnostic accuracy studies. Although this is not an explicit 
recommendation of the STARDdem guidelines [Standards for the Reporting of Diagnostic 
Accuracy studies specific to diagnostic accuracy studies in dementia; www.starddem.org), 
summary measures such as Cohen's d and weighted comparison might be considered in 
future iterations of these guidelines. 
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